AAAI.2024 - Undergraduate Consortium

Total: 20

#1 LLM-Powered Synthetic Environments for Self-Driving Scenarios

Author: Oluwanifemi Adebayo Moses Adekanye

This paper outlines a proposal exploring the potential use of Large Language Models (LLMs), particularly GPT-4, in crafting realistic synthetic environments for self-driving scenarios. The envisioned approach involves dynamic scene generation within game engines, leveraging LLMs to introduce challenging elements for autonomous vehicles. The proposed evaluation process outlines assessments such as realistic testing, safety metrics, and user interaction, aiming to set the stage for potential improvements in self-driving system performance. The paper aims to contribute to the AI field by discussing how LLMs could be utilized to create valuable testing grounds for autonomous vehicles, potentially fostering the development of more robust self-driving technology. The envisioned impact is the eventual enhancement of road safety and the possible acceleration of the adoption of autonomous vehicles, paving the way for a future with safer and more efficient transportation.

#2 Integrating Neural Pathways for Learning in Deep Reinforcement Learning Models

Author: Varun Ananth

Considering that the human brain is the most powerful, generalizable, and energy-efficient computer we know of, it makes the most sense to look to neuroscience for ideas regarding deep learning model improvements. I propose one such idea, augmenting a traditional Advantage-Actor-Critic (A2C) model with additional learning signals akin to those in the brain. Pursuing this direction of research should hopefully result in a new reinforcement learning (RL) control paradigm that can learn from fewer examples, train with greater stability, and possibly consume less energy.

#3 Evaluating AI Red Teaming’s Readiness to Address Environmental Harms: A Thematic Analysis of LLM Discourse

Author: Amy Au

This research explores the discourse surrounding red teaming and aims to identify any themes in the online discussion of potential environmental harms stemming from Large Language Models (LLMs). Focusing on the AI Red Teaming event at DEFCON 31, this study employs reflexive thematic analysis on diverse social networking site sources to extract insights into public discussion of LLM red teaming and its environmental implications. The findings intend to inform future research, highlighting the need for responsible AI development that addresses environmental concerns.

#4 Enhancing Healthcare Predictions with Deep Learning Models

Author: Adam Baji

This study leverages Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs) to enhance diagnostics and predictions in healthcare. By training on extensive healthcare datasets, this project aims to improve early disease detection and health risk assessments. Evaluation emphasizes accuracy, reliability, and ethical considerations, including bias mitigation. This research promises to bridge AI advancements and clinical applications, offering significant improvements in diagnostic capabilities and healthcare accessibility.

#5 Securing Billion Bluetooth Devices Leveraging Learning-Based Techniques

Author: Hanlin Cai

Bluetooth Low Energy (BLE) is the most popular low-power communication protocol, and research on its cybersecurity has garnered significant attention. Due to BLE’s inherent security limitations and firmware vulnerabilities, spoofing attacks can easily compromise BLE devices and tamper with private data. In this paper, we propose BLEGuard, a hybrid detection mechanism that combines cyber-physical features with learning-based techniques. We established a physical network testbed to conduct attack simulations and capture advertising packets. Four different network features were used to implement detection and classification algorithms. Preliminary results have verified the feasibility of our proposed methods.

#6 Flow-Event Autoencoder: Event Stream Object Recognition Dataset Generation with Arbitrary High Temporal Resolution

Author: Minghai Chen

Event cameras have unique advantages in high temporal resolution and dynamic range and have shown potential in several computer vision tasks. However, due to the novelty of this hardware, there is a lack of large benchmark DVS event-stream datasets, including datasets for object recognition. In this work, we propose an encoder-decoder method that augments event-stream datasets from images and optical flow at arbitrary temporal resolution for object recognition tasks. We believe this method generalizes well to augmenting event-stream vision data for object recognition and will help advance the event vision paradigm.

#7 Revolutionizing Education through AI-Powered Inclusive Learning Systems

Author: Chahana Dahal

This proposal introduces an innovative AI-powered learning system designed to address educational disparities worldwide. Focused on developing countries, the system seamlessly translates educational content between English and native languages, breaking down language barriers. Leveraging advanced natural language processing and machine learning techniques, including transformer models like BERT and GPT-3, the system ensures inclusivity, effectiveness, and engagement. Built on prior research demonstrating AI's efficacy in language translation and personalized learning, the proposed system draws inspiration from successful projects like Duolingo Language Incubator. By providing inclusive and accessible learning experiences, it empowers individuals to overcome language barriers, fostering global participation. The potential impact is significant, with the system poised to accelerate learning, enhance literacy rates, and create a more skilled workforce in developing countries. This research reflects a commitment to revolutionize education through technology, aiming for lasting and transformative contributions to global society. Through AI-driven education, a brighter, more inclusive future is envisioned.

#8 Enhancing Robotics with Cognitive Capabilities

Author: Joseph Fatoye

In the pursuit of creating more effective and adaptable robots, the flourishing field of cognitive robotics has arisen to infuse machines with human-like cognitive functions. This paper delves into the significance of cognitive robotics and charts a course for empowering robots with advanced cognitive capabilities. Drawing inspiration from current research in cognitive architectures, the paper underscores the importance of refined perception, language processing, complex decision-making, emotional intelligence, and cognitive synergy. By integrating these cognitive functions into robotic systems, the goal is to equip robots to operate intelligently in dynamic environments, collaborate seamlessly with humans, and adeptly handle diverse tasks. The proposed enhancements mark crucial strides towards the development of more versatile and capable intelligent robots.

#9 Using Reinforcement Learning to Iteratively Construct Road Networks from Satellite Images and GPS Data

Author: Isaiah Gallardo

Constructing road networks manually is a time-consuming and labor-intensive process. This paper proposes a new method to iteratively construct road networks using reinforcement learning from a combined tensor-based representation of satellite image and GPS trajectory data.

#10 Statistically Principled Deep Learning for SAR Image Segmentation

Author: Cassandra Goldberg

This paper proposes a novel approach for Synthetic Aperture Radar (SAR) image segmentation by incorporating known statistical properties of SAR into deep learning models. We generate synthetic data using the Generalized Gamma distribution, modify the U-Net architecture to encompass statistical moments, and employ stochastic distance losses for improved segmentation performance. Evaluation against traditional methods will reveal the potential of this approach to advance SAR image analysis, with broader applications in environmental monitoring and general image segmentation tasks.
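The synthetic-data step can be sketched as follows, assuming Stacy's three-parameter form of the Generalized Gamma distribution (scale `a`, shape parameters `d` and `p`). The abstract does not state which parameterization or parameter values the author uses, so the function names and the values below are illustrative only. The sampler relies on the fact that if G follows a Gamma(d/p, 1) distribution, then a * G^(1/p) follows the Generalized Gamma distribution in this form.

```python
import math
import random


def sample_generalized_gamma(a: float, d: float, p: float, rng: random.Random) -> float:
    """Draw one sample from Stacy's Generalized Gamma distribution.

    Assumed density: f(x) = (p / a**d) * x**(d - 1) * exp(-(x / a)**p) / Gamma(d / p).
    """
    # If G ~ Gamma(shape=d/p, scale=1), then a * G**(1/p) has the density above.
    g = rng.gammavariate(d / p, 1.0)
    return a * g ** (1.0 / p)


def generalized_gamma_mean(a: float, d: float, p: float) -> float:
    """Closed-form mean of the same distribution, useful as a sanity check."""
    return a * math.gamma((d + 1.0) / p) / math.gamma(d / p)


def synthetic_patch(height: int, width: int, a: float, d: float, p: float,
                    rng: random.Random) -> list:
    """Fill a height x width patch with independent draws (hypothetical helper)."""
    return [[sample_generalized_gamma(a, d, p, rng) for _ in range(width)]
            for _ in range(height)]


if __name__ == "__main__":
    rng = random.Random(0)
    patch = synthetic_patch(4, 4, a=2.0, d=3.0, p=1.5, rng=rng)
    print(len(patch), len(patch[0]))
```

Comparing the sample mean of many draws against `generalized_gamma_mean` is a quick way to check that the parameterization matches the one intended.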

#11 A Novel Approach for Longitudinal Modeling of Aging Health and Predicting Mortality Rates

Author: Hannah Guan

Aging is a complex stochastic process that affects healthy functioning through various pathways. In contrast to the more commonly used cross-sectional methods, our research focuses on longitudinal modeling of aging, a less explored but crucial area. We have developed a Stochastic Differential Equation (SDE) model, at the forefront of aging research, designed to accurately forecast the health trajectories and survival rates of individuals. This model adeptly delineates the connections between different health indicators and provides clear, interpretable results. Our approach utilizes the SDE framework to encapsulate the inherent uncertainty in the aging process. Moreover, it incorporates a Recurrent Neural Network (RNN) to integrate past health data into future health projections. We plan to train and test our model using a comprehensive dataset tailored for aging studies. This model is not only computationally cost-effective but also highly relevant in assessing health risks in older populations, particularly for those at high risk. It can serve as an essential tool in anticipating and preparing for challenges like infectious disease outbreaks. Overall, our research aims to improve health equity and global health security significantly, offering substantial benefits to public health and deepening our understanding of the aging process.
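To make the SDE framing concrete, here is a minimal Euler-Maruyama simulation of a single health indicator. The abstract does not give the model's drift or diffusion terms, so the mean-reverting drift, the constant noise level, and all names below are assumptions chosen for illustration; the RNN component is not included.

```python
import math
import random
from typing import Callable, List


def euler_maruyama(x0: float,
                   drift: Callable[[float], float],
                   diffusion: Callable[[float], float],
                   dt: float,
                   n_steps: int,
                   rng: random.Random) -> List[float]:
    """Simulate dX = drift(X) dt + diffusion(X) dW with the Euler-Maruyama scheme."""
    path = [x0]
    x = x0
    for _ in range(n_steps):
        # Brownian increment over a step of length dt has variance dt.
        dw = rng.gauss(0.0, math.sqrt(dt))
        x = x + drift(x) * dt + diffusion(x) * dw
        path.append(x)
    return path


if __name__ == "__main__":
    rng = random.Random(42)
    # Hypothetical indicator that relaxes toward a baseline of 1.0 under noise.
    trajectory = euler_maruyama(
        x0=0.5,
        drift=lambda x: 0.8 * (1.0 - x),
        diffusion=lambda x: 0.1,
        dt=0.01,
        n_steps=1000,
        rng=rng,
    )
    print(len(trajectory))
```

Running many such trajectories from the same starting point gives a distribution of outcomes, which is how an SDE model expresses uncertainty in a forecast.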

#12 Multimodal Ensembling for Zero-Shot Image Classification

Author: Javon Hickmon

Artificial intelligence has made significant progress in image classification, an essential task for machine perception to achieve human-level image understanding. Despite recent advances in vision-language fields, multimodal image classification is still challenging, particularly for the following two reasons. First, models with low capacity often suffer from underfitting and thus underperform on fine-grained image classification. Second, it is important to ensure high-quality data with rich cross-modal representations of each class, which is often difficult to generate. Here, we utilize ensemble learning to reduce the impact of these issues on pre-trained models. We aim to create a meta-model that combines the predictions of multiple open-vocabulary multimodal models trained on different data to create more robust and accurate predictions. By utilizing ensemble learning and multimodal machine learning, we will achieve higher prediction accuracies without any additional training or fine-tuning, meaning that this method is completely zero-shot.
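One simple way to combine predictions without any training is to average the class probabilities that each model produces and pick the highest-scoring class. The abstract does not specify the combination rule of the proposed meta-model, so plain averaging and the function name below are assumptions, and the probability tables stand in for the outputs of real open-vocabulary models.

```python
from typing import List, Sequence, Tuple


def ensemble_predict(prob_tables: Sequence[Sequence[float]],
                     class_names: Sequence[str]) -> Tuple[str, List[float]]:
    """Average per-class probabilities across models and return the top class."""
    if not prob_tables:
        raise ValueError("at least one model output is required")
    n_classes = len(class_names)
    for table in prob_tables:
        if len(table) != n_classes:
            raise ValueError("each model must score every class")
    # Unweighted mean of the models' probabilities for each class.
    averaged = [sum(table[i] for table in prob_tables) / len(prob_tables)
                for i in range(n_classes)]
    best = max(range(n_classes), key=averaged.__getitem__)
    return class_names[best], averaged


if __name__ == "__main__":
    # Stand-in outputs of three hypothetical zero-shot models for one image.
    outputs = [
        [0.70, 0.20, 0.10],
        [0.30, 0.60, 0.10],
        [0.25, 0.65, 0.10],
    ]
    label, scores = ensemble_predict(outputs, ["sparrow", "finch", "wren"])
    print(label)
```

Because no parameters are fitted, the combination step itself adds no training, which is consistent with the zero-shot setting described above.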

#13 Vision-Language Models for Robot Success Detection

Author: Fiona Luo

In this work, we use Vision-Language Models (VLMs) as a binary success detector given a robot observation and task description, formulated as a Visual Question Answering (VQA) problem. We fine-tune the open-source MiniGPT-4 VLM to detect success on robot trajectories from the Berkeley Bridge and Berkeley AUTOLab UR5 datasets. We find that while a handful of test distribution trajectories can train an accurate detector, transferring learning between different environments is challenging due to distribution shift. In addition, while our VLM is robust to language variations, it is less robust to visual variations. In the future, more powerful VLMs such as Gemini and GPT-4 have the potential to be more accurate and robust success detectors, and success detectors can provide a sparse binary reward to improve existing policies.

#14 Transforming Healthcare: A Comprehensive Approach to Mitigating Bias and Fostering Empathy through AI-Driven Augmented Reality

Author: Erica Okeh

The integration of Artificial Intelligence (AI) into Augmented Reality (AR) for medical applications is propelled by the aim to address evident healthcare disparities. Certain communities have encountered disparities in medical diagnoses, exemplified by Black individuals exhibiting a 2.4 times higher likelihood of schizophrenia diagnosis compared to their white counterparts (Faber et al., 2023). These disparities often arise from structured interview assessments overlooking cultural nuances, resulting in increased misdiagnosis rates. This study leverages AI and AR to develop unbiased diagnostic tools and enhance empathy in healthcare professionals' training. Uniquely prioritizing the reduction of biased language and the fostering of empathy through AI-driven Natural Language Processing (NLP) and AI-driven virtual patients, the research aims to enhance diagnostic accuracy while promoting cultural sensitivity among healthcare professionals. Aligned with broader goals of achieving equitable healthcare and reducing disparities, the evaluation involves pre- and post-training assessments to measure language improvements and empathy enhancements. Successful implementation could lead to a more equitable healthcare landscape, fostering trust in AI-driven systems and ensuring fairer medical care for diverse communities.

#15 Defog Artificial Intelligence Glasses: Neural Networks for the Imperfect Real World

Author: Nilton Rojas

This research investigates the generalization capabilities of neural networks in deep learning when applied to real-world scenarios where data often contains imperfections, focusing on their adaptability to both noisy and non-noisy scenarios for image retrieval tasks. Our study explores approaches to preserve all available data, regardless of quality, for diverse tasks. The evaluation of results varies per task, because the ultimate goal is to develop a technique that extracts relevant information while disregarding noise in the final network design for each specific task. The aim is to enhance the accessibility and efficiency of AI across diverse tasks, particularly for individuals or countries with limited resources who lack access to high-quality data. The work is dedicated to fostering inclusivity and unlocking the potential of AI for widespread societal benefit.

#16 Multi-world Model in Continual Reinforcement Learning

Author: Kevin Shen

World Models are made of generative networks that can predict future states of the single environment on which they were trained. This research proposes a Multi-world Model, a foundational model built from World Models for the field of continual reinforcement learning that is trained on many different environments, enabling it to generalize state sequence predictions even for unseen settings.

#17 AI-Enhanced Art Appreciation: Generating Text from Artwork to Promote Inclusivity

Author: Tanisha Shende

Visual art facilitates expression, communication, and connection, yet it remains inaccessible to those who are visually impaired and those who lack the resources to understand the techniques and history of art. In this work, I propose the development of a generative AI model that generates a description and interpretation of a given artwork. Such research can make art more accessible, support art education, and improve the ability of AI to understand and translate between creative media. Development will begin with a formative study to assess the needs and preferences of blind and low-vision people and art experts. Following the formative study, the basic approach is to train the model on a database of artworks and their accompanying descriptions, predict sentiments from extracted visual data, and generate a paragraph closely resembling training textual data and incorporating sentiment analysis. The model will then be evaluated quantitatively through metrics like METEOR and qualitatively through Turing tests in an iterative process.

#18 Adapted Weighted Aggregation in Federated Learning

Author: Yitong Tang

This study introduces FedAW, a novel federated learning algorithm that uses a weighted aggregation mechanism sensitive to the quality of client datasets, leading to better model performance and faster convergence on diverse datasets, validated using Colored MNIST.
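The general idea of quality-sensitive aggregation can be sketched as a weighted average of client parameters. This is not the FedAW algorithm itself: the abstract does not say how dataset quality is measured or how weights are derived, so the `quality_scores` input and the simple normalization below are assumptions.

```python
from typing import List, Sequence


def weighted_aggregate(client_params: Sequence[Sequence[float]],
                       quality_scores: Sequence[float]) -> List[float]:
    """Combine client parameter vectors, weighting each by its quality score."""
    if not client_params or len(client_params) != len(quality_scores):
        raise ValueError("need one quality score per client")
    if any(score < 0 for score in quality_scores):
        raise ValueError("quality scores must be non-negative")
    total = sum(quality_scores)
    if total <= 0:
        raise ValueError("at least one quality score must be positive")
    # Normalize so the weights sum to one.
    weights = [score / total for score in quality_scores]
    dim = len(client_params[0])
    return [sum(w * params[i] for w, params in zip(weights, client_params))
            for i in range(dim)]


if __name__ == "__main__":
    # Two hypothetical clients; the second is judged three times more reliable.
    global_params = weighted_aggregate([[1.0, 2.0], [3.0, 4.0]], [1.0, 3.0])
    print(global_params)
```

Setting every quality score to the same value reduces this to a plain unweighted mean of the client parameters.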

#19 Deep Learning for Style Transfer and Experimentation with Audio Effects and Music Creation

Author: Ada Tur

Recent advancements in deep learning have the potential to transform the process of writing and creating music. Models that can capture and analyze higher-level representations of music and audio can serve to change the field of digital signal processing. In this statement, I propose a set of Music+AI methods that serves to assist with the writing of melodies, modelling and transferring of timbres, applying a wide variety of audio effects, including research into experimental audio effects, and production of audio samples using style transfers. Writing and producing music is a tedious task that is notably difficult to become proficient in, as many tools to create music both cost sums of money and require long-term commitments to study. An all-encompassing framework for music processing would make the process much more accessible and simple and would allow for human art to work alongside technology to advance.

#20 Validation, Robustness, and Accuracy of Perturbation-Based Sensitivity Analysis Methods for Time-Series Deep Learning Models

Author: Zhengguang Wang

This work evaluates interpretability methods for time-series deep learning. Sensitivity analysis assesses how input changes affect the output, constituting a key component of interpretation. Among post-hoc interpretation methods such as back-propagation, perturbation, and approximation, my work will investigate perturbation-based sensitivity analysis methods on modern Transformer models to benchmark their performance. Specifically, my work intends to answer three research questions: 1) Do different sensitivity analysis methods yield comparable outputs and attribute importance rankings? 2) Using the same sensitivity analysis method, do different deep learning models impact the output of the sensitivity analysis? 3) How well do the results from sensitivity analysis methods align with the ground truth?
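A minimal sketch of one perturbation-based method: replace each time step with a baseline value and record how much the model output changes. The specific methods and Transformer models to be benchmarked are not named in the abstract, so the single-step occlusion scheme, the zero baseline, and the toy linear model below are assumptions for illustration.

```python
from typing import Callable, List, Sequence


def perturbation_sensitivity(model: Callable[[Sequence[float]], float],
                             series: Sequence[float],
                             baseline: float = 0.0) -> List[float]:
    """Score each time step by the absolute output change when it is occluded."""
    reference = model(series)
    scores = []
    for t in range(len(series)):
        perturbed = list(series)
        perturbed[t] = baseline  # occlude one time step at a time
        scores.append(abs(reference - model(perturbed)))
    return scores


def importance_ranking(scores: Sequence[float]) -> List[int]:
    """Return time-step indices ordered from most to least influential."""
    return sorted(range(len(scores)), key=lambda t: scores[t], reverse=True)


def toy_model(series: Sequence[float]) -> float:
    """Stand-in for a trained model: a fixed weighted sum of three time steps."""
    weights = [0.1, 0.5, 2.0]
    return sum(w * x for w, x in zip(weights, series))


if __name__ == "__main__":
    scores = perturbation_sensitivity(toy_model, [1.0, 1.0, 1.0])
    print(importance_ranking(scores))
```

Because the toy model's weights are known, the recovered ranking can be checked against ground truth, which mirrors the third research question in a very small setting.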